# Vision-Language Interaction
## Qwen2.5 VL 7B Instruct Q8 0 GGUF
- **License:** Apache-2.0
- **Description:** A GGUF-format conversion of Qwen2.5-VL-7B-Instruct, supporting multimodal image-and-text interaction tasks.
- **Tags:** Text-to-Image, English
- **Author:** cxtb · **Downloads:** 72 · **Likes:** 1
## Magma 8B
- **License:** MIT
- **Description:** Magma is a foundation multimodal AI agent model that processes image and text inputs to generate text outputs, with complex interaction capabilities in both virtual and real-world environments.
- **Tags:** Image-to-Text, Transformers
- **Author:** microsoft · **Downloads:** 4,526 · **Likes:** 363
## Qwen2.5 VL 3B Instruct MLX 8bits
- **Description:** An 8-bit quantized version of the Qwen2.5-VL-3B-Instruct model, optimized for the MLX framework and supporting image-to-text generation tasks.
- **Tags:** Image-to-Text, Transformers, English
- **Author:** moot20 · **Downloads:** 27 · **Likes:** 1
## AURORA
- **License:** MIT
- **Description:** AURORA is an action- and reasoning-centric image editing model trained on video and simulation data, focused on vision-language tasks.
- **Tags:** Image Generation, English
- **Author:** McGill-NLP · **Downloads:** 81 · **Likes:** 4
## Llava Meta Llama 3 8B Instruct
- **Description:** A multimodal model integrating Meta-Llama-3-8B-Instruct and LLaVA-v1.5, providing advanced vision-language understanding capabilities.
- **Tags:** Image-to-Text, Transformers
- **Author:** MBZUAI · **Downloads:** 20 · **Likes:** 11
## Internlm Xcomposer2 Vl 7b
- **License:** Other
- **Description:** InternLM-XComposer2 is a vision-language large model built on InternLM2, featuring outstanding image-text understanding and creation capabilities.
- **Tags:** Text-to-Image, Transformers
- **Author:** internlm · **Downloads:** 1,902 · **Likes:** 82
## Instructblip Vicuna 7b 8bit
- **Description:** InstructBLIP-Vicuna-7B is a vision-language model based on Vicuna-7B, supporting image-to-text conversion tasks.
- **Tags:** Image-to-Text, Transformers
- **Author:** Mediocreatmybest · **Downloads:** 24 · **Likes:** 3